39 research outputs found

    Blending big data analytics : review on challenges and a recent study

    With the collection of massive amounts of data every day, big data analytics has emerged as an important trend for many organizations. These collected data can contain important information that may be key to solving wide-ranging problems, such as cyber security, marketing, healthcare, and fraud. To analyze their large volumes of data for business analyses and decisions, large companies, such as Facebook and Google, adopt analytics. Such analyses and decisions impact existing and future technology. In this paper, we explore how big data analytics is utilized as a technique for solving problems of complex and unstructured data using such technologies as Hadoop, Spark, and MapReduce. We also discuss the data challenges introduced by big data according to the literature, including its six V's. Moreover, we investigate case studies of big data analytics on various techniques of such analytics, namely, text, voice, video, and network analytics. We conclude that big data analytics can bring positive changes in many fields, such as education, military, healthcare, politics, business, agriculture, banking, and marketing, in the future. © 2013 IEEE

    Real-time big data processing for anomaly detection : a survey

    The advent of connected devices and the omnipresence of the Internet have paved the way for intruders to attack networks, leading to cyber-attacks, financial loss, information theft in healthcare, and cyber war. Hence, network security analytics has become an important area of concern and has lately gained intensive attention among researchers, specifically in the domain of network anomaly detection, which is considered crucial for network security. However, preliminary investigations have revealed that the existing approaches to detecting anomalies in networks are not effective enough, particularly in real time. The inefficacy of current approaches is mainly due to the massive volumes of data amassed through connected devices. Therefore, it is crucial to propose a framework that effectively handles real-time big data processing and detects anomalies in networks. In this regard, this paper attempts to address the issue of detecting anomalies in real time. Accordingly, it surveys the state-of-the-art real-time big data processing technologies related to anomaly detection and the vital characteristics of the associated machine learning algorithms. The paper begins with an explanation of the essential context and taxonomy of real-time big data processing, anomaly detection, and machine learning algorithms, followed by a review of big data processing technologies. Finally, the identified research challenges of real-time big data processing in anomaly detection are discussed. © 2018 Elsevier Lt

    Profiling users' behavior, and identifying important features of review 'helpfulness'

    The increasing volume of online reviews and the use of review platforms leave tracks that can be used to explore interesting patterns. It is in the primary interest of businesses to retain and improve their reputation. Reviewers, on the other hand, tend to write reviews that can influence and attract people’s attention, which often leads to deliberate deviations from past rating behavior. Until now, very few studies have attempted to explore the impact of user rating behavior on review helpfulness, and more perspectives of user behavior in selecting and rating businesses still need to be investigated. Moreover, previous studies gave more attention to the review features and reported inconsistent findings on the importance of those features. To fill this gap, we introduce new and modify existing business and reviewer features and propose a user-focused mechanism for review selection. This study aims to investigate and report changes in business reputation, user choice, and rating behavior through descriptive and comparative analysis. Furthermore, the relevance of various features for review helpfulness is identified by correlation, linear regression, and negative binomial regression. The analysis performed on the Yelp dataset shows that the reputation of the businesses has changed slightly over time. Moreover, 46% of the users chose a business with a minimum of 4 stars. The majority of users give 4-star ratings, and 60% of reviewers adopt irregular rating behavior. Our results show a slight improvement from using user rating behavior and choice features, whereas the significant increase in R² indicates the importance of reviewer popularity and experience features. The overall results show that the most significant features of review helpfulness are average user helpfulness, number of user reviews, average business helpfulness, and review length. The outcomes of this study provide important theoretical and practical implications for researchers, businesses, and reviewers
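
The abstract above judges feature importance partly by the change in R² of a regression. As a rough illustration only (not the authors' code, and limited to a single predictor rather than the multi-feature models the study describes), the sketch below fits an ordinary least-squares line and reports R². The function names and the toy numbers are hypothetical.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def r_squared(xs, ys):
    """Share of the variance in ys explained by the fitted line."""
    a, b = fit_line(xs, ys)
    y_bar = mean(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical toy data: review length (words) vs. helpful votes.
review_length = [20, 50, 80, 120, 200]
helpful_votes = [1, 2, 4, 5, 9]
print(round(r_squared(review_length, helpful_votes), 3))
```

Comparing this value across candidate features gives a crude sense of which one explains more of the variation in helpfulness. The study itself also uses negative binomial regression, which this sketch does not cover.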

    The role of big data in smart city

    The expansion of big data and the evolution of Internet of Things (IoT) technologies have played an important role in the feasibility of smart city initiatives. Big data offer the potential for cities to obtain valuable insights from a large amount of data collected through various sources, and the IoT allows the integration of sensors, radio-frequency identification, and Bluetooth in the real-world environment using highly networked services. The combination of the IoT and big data is an unexplored research area that has brought new and interesting challenges for achieving the goal of future smart cities. These new challenges focus primarily on problems related to business and technology that enable cities to actualize the vision, principles, and requirements of the applications of smart cities by realizing the main smart environment characteristics. In this paper, we describe the existing communication technologies and smart-based applications used within the context of smart cities. The visions of big data analytics to support smart cities are discussed by focusing on how big data can fundamentally change urban populations at different levels. Moreover, a future business model that can manage big data for smart cities is proposed, and the business and technological research challenges are identified. This study can serve as a benchmark for researchers and industries for the future progress and development of smart cities in the context of big data

    A Survey on Underwater Wireless Sensor Networks: Requirements, Taxonomy, Recent Advances, and Open Research Challenges

    The domain of underwater wireless sensor networks (UWSNs) has received a lot of attention recently due to its advanced capabilities in ocean surveillance, marine monitoring, and application deployment for detecting underwater targets. However, the literature has not compiled the state of the art in this direction to capture the recent advancements fuelled by underwater sensor technologies. Hence, this paper offers an up-to-date analysis of the available evidence by reviewing studies from the past five years on various aspects that support network activities and applications in UWSN environments. This work was motivated by the need for robust and flexible solutions that can satisfy the requirements of the rapid development of UWSNs. This paper identifies the key requirements for achieving essential services as well as common platforms for UWSNs. It also contributes a taxonomy of the critical elements in UWSNs by devising a classification of architectural elements, communications, routing protocols and standards, security, and applications of UWSNs. Finally, the major challenges that remain open are presented as a guide for future research directions

    JQPro : Join query processing in a distributed system for big RDF data using the hash-merge join technique

    In the last decade, the volume of semantic data has increased exponentially, with the number of Resource Description Framework (RDF) datasets exceeding trillions of triples in RDF repositories. Hence, the size of RDF datasets continues to grow. With the increasing number of RDF triples, the demand for complex multiple RDF queries is becoming significant. Sometimes, such complex queries produce many common sub-expressions in a single query or over multiple queries running as a batch. In addition, it is difficult to minimize the number of RDF queries and the processing time for a large amount of related data in a typical distributed environment. To address this complication, we introduce a join query processing model for big RDF data, called JQPro. By adopting a MapReduce framework in JQPro, we developed three new algorithms, namely hash-join, sort-merge, and enhanced MapReduce-join, for join query processing of RDF data. Experimental results showed that the JQPro model outperformed two popular algorithms, gStore and RDF-3X, with respect to average execution time. Furthermore, the JQPro model was also tested against RDF-3X, RDFox, and PARJs using the LUBM benchmark. The results showed that the JQPro model performed better than the other models. In conclusion, the findings showed that JQPro improved performance by 87.77% in terms of execution time. Hence, in comparison with the selected models, JQPro performs better
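
The abstract names hash-join as one of JQPro's algorithms but does not give its details. The sketch below is therefore not JQPro's implementation: it is a generic in-memory hash join (build phase, then probe phase) over variable bindings produced by two RDF triple patterns. The helpers `match_pattern` and `hash_join`, and the toy triples, are assumptions made for illustration.

```python
from collections import defaultdict


def match_pattern(triples, predicate, s_var, o_var):
    """Bind subject and object of every triple with the given predicate."""
    return [{s_var: s, o_var: o} for s, p, o in triples if p == predicate]


def hash_join(left, right, key):
    """Join two lists of bindings on a shared variable name."""
    # Build phase: index the left bindings by the join variable.
    table = defaultdict(list)
    for row in left:
        table[row[key]].append(row)
    # Probe phase: look up each right binding and merge the matches.
    joined = []
    for row in right:
        for match in table.get(row[key], []):
            joined.append({**match, **row})
    return joined


# Hypothetical toy data in the spirit of a university benchmark.
triples = [
    ("alice", "advisor", "bob"),
    ("carol", "advisor", "bob"),
    ("bob", "worksFor", "dept1"),
    ("dave", "worksFor", "dept2"),
]
students = match_pattern(triples, "advisor", "student", "prof")
employers = match_pattern(triples, "worksFor", "prof", "dept")
for binding in hash_join(students, employers, "prof"):
    print(binding)
```

In a distributed setting the build and probe sides would be partitioned by the join key across workers. This sketch keeps everything in one process to show only the join logic.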

    Indigenous food recognition model based on various convolutional neural network architectures for gastronomic tourism business analytics

    In gastronomic tourism, food is viewed as the central tourist attraction. Specifically, indigenous food is known to represent the expression of local culture and identity. To promote gastronomic tourism, it is critical to have a model for the food business analytics system. This research undertakes an empirical evaluation of recent transfer learning models for deep learning feature extraction for a food recognition model. The VIREO-Food172 Dataset and a newly established Sabah Food Dataset are used to evaluate the food recognition model. Afterwards, the model is implemented in a web application system as an attempt to automate food recognition. In this model, a fully connected layer with 11 and 10 Softmax neurons is used as the classifier for food categories in both datasets. Six pre-trained Convolutional Neural Network (CNN) models are evaluated as the feature extractors to extract essential features from food images. From the evaluation, the research found that the EfficientNet feature extractor paired with a CNN classifier achieved the highest classification accuracy of 94.01% on the Sabah Food Dataset and 86.57% on the VIREO-Food172 Dataset. EFFNet as a feature representation outperformed Xception in terms of overall performance. However, Xception can still be considered, despite its somewhat lower accuracy, if computational speed and memory usage are more important than accuracy

    Credit card default prediction using machine learning techniques

    Credit risk plays a major role in the banking industry. Banks' main activities involve granting loans, credit cards, investments, mortgages, and other services. Credit cards have been one of the fastest-growing financial services offered by banks over the past years. However, with the growing number of credit card users, banks have been facing an escalating credit card default rate. As such, data analytics can provide solutions to tackle this phenomenon and manage credit risks. This paper provides a performance evaluation of credit card default prediction. Logistic regression, rpart decision tree, and random forest are used to test the variables in predicting credit default, and random forest proved to have the highest accuracy and area under the curve. This result shows that random forest best describes which factors should be considered, with an accuracy of 82% and an area under the curve of 77%, when assessing the credit risk of credit card customers
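
The two metrics reported above, accuracy and area under the (ROC) curve, can be computed from any classifier's scores. The sketch below is a minimal pure-Python version, assuming binary labels (1 = default) and scores where higher means more likely to default. It is not the paper's evaluation code, and the toy labels and scores are hypothetical.

```python
def accuracy(y_true, scores, threshold=0.5):
    """Fraction of cases where the thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve, computed as the probability that a random
    positive case scores higher than a random negative one (ties count half)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy labels and predicted default probabilities.
labels = [0, 0, 1, 1]
probs = [0.1, 0.4, 0.35, 0.8]
print(accuracy(labels, probs), auc(labels, probs))
```

Accuracy depends on the chosen threshold, while the area under the curve summarizes ranking quality across all thresholds, which is why the two figures can differ for the same model.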

    Distributed Join Query Processing for Big RDF Data

    The expansion of the services of the Semantic Web and the evolution of cloud computing technologies have significantly enhanced the capability of preserving and publishing information in standard open web formats, such that data can be both human-readable and machine-processable. This situation raises the challenge, in the current big data era, of effectively storing, retrieving, and analyzing large volumes of Resource Description Framework (RDF) data. Moreover, efficient data storage and retrieval that can scale to large amounts of possibly schema-less data have proven to be quite difficult to achieve, specifically for RDF data storage with complex and large graph patterns for representing semantic data, and for the SPARQL query language. In this paper, we provide a comprehensive discussion of the proposed algorithms for join query processing of RDF data using the MapReduce framework in a distributed environment. Moreover, we introduce a framework for RDF query processing and the benchmark that is used for the performance evaluation. Finally, we offer an evaluation discussion on distributed join query processing for big RDF data
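
The abstract refers to join query processing under MapReduce without spelling out the algorithms. As a hedged illustration, the sketch below simulates a textbook reduce-side join in a single process: the map step tags each triple with the join key, the shuffle step groups by key, and the reduce step pairs the two sides. It is not the paper's method, and the function names and sample triples are hypothetical.

```python
from collections import defaultdict
from itertools import product


def map_phase(triples, left_pred, right_pred):
    """Emit (join_key, (side, value)) pairs for the two triple patterns."""
    for s, p, o in triples:
        if p == left_pred:
            yield o, ("L", s)  # left pattern joins on its object
        elif p == right_pred:
            yield s, ("R", o)  # right pattern joins on its subject


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """For each key, combine every left value with every right value."""
    for key, values in groups.items():
        lefts = [v for side, v in values if side == "L"]
        rights = [v for side, v in values if side == "R"]
        for left, right in product(lefts, rights):
            yield (left, key, right)


def mr_join(triples, left_pred, right_pred):
    pairs = map_phase(triples, left_pred, right_pred)
    return sorted(reduce_phase(shuffle(pairs)))


# Hypothetical toy triples.
triples = [
    ("alice", "advisor", "bob"),
    ("carol", "advisor", "bob"),
    ("bob", "worksFor", "dept1"),
    ("dave", "worksFor", "dept2"),
]
print(mr_join(triples, "advisor", "worksFor"))
```

In a real cluster the shuffle is performed by the framework and each key's group is reduced on a separate worker, so the cost is dominated by how much data has to move between the map and reduce stages.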